132 research outputs found

    All that glitters...: Interannotator agreement in natural language processing

    Evaluation has emerged as a central concern in natural language processing (NLP) over the last few decades. Evaluation is done against a gold standard, a manually linguistically annotated dataset, which is assumed to provide the ground truth against which the accuracy of the NLP system can be assessed automatically. In this article, some methodological questions in connection with the creation of gold standard datasets are discussed, in particular (non-)expectations of linguistic expertise in annotators and the interannotator agreement measure that is routinely but uncritically used as a kind of quality index of NLP gold standards.

    Estimating language relationships from a parallel corpus. A study of the Europarl corpus

    Proceedings of the 18th Nordic Conference of Computational Linguistics NODALIDA 2011. Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa. NEALT Proceedings Series, Vol. 11 (2011), 161-167. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/16955

    Dialect classification in the Himalayas: a computational approach

    Proceedings of the 18th Nordic Conference of Computational Linguistics NODALIDA 2011. Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa. NEALT Proceedings Series, Vol. 11 (2011), 307-310. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/1695

    All in the Family: A Comparison of SALDO and WordNet

    Proceedings of the NODALIDA 2009 workshop WordNets and other Lexical Semantic Resources — between Lexical Semantics, Lexicography, Terminology and Formal Ontologies. Editors: Bolette Sandford Pedersen, Anna Braasch, Sanni Nimb and Ruth Vatvedt Fjeld. NEALT Proceedings Series, Vol. 7 (2009), 7-12. © 2009 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/9209

    Synchronic and Diachronic Aspects of Kanashi

    Kanashi is a Sino-Tibetan language belonging to the West Himalayish subbranch of this language family. It is spoken by fewer than 2,000 individuals in a single village (Malana in Kullu district, Himachal Pradesh state, India). The book presents an overview of synchronic and diachronic aspects of Kanashi: its sound system, its grammar in outline, its intriguing numeral systems, and word lists (English-Kanashi, Kanashi-English).

    Semantic search in literature as an e-Humanities research tool: CONPLISIT — Consumption patterns and life-style in 19th century Swedish literature

    Proceedings of the 18th Nordic Conference of Computational Linguistics NODALIDA 2011. Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa. NEALT Proceedings Series, Vol. 11 (2011), 58-65. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/16955

    Editor: Donncha O Croinin

    Finnish Romani is a language with a fairly recent written tradition; for all practical purposes it is a 20th century phenomenon. An official orthography was created in 1971, and it is mostly from the 1970s onwards that we see texts of the kind that we normally associate with a written language variety. The text corpus described here is being compiled to support an ongoing investigation into the effects of language contact on Finnish Romani.